This article examines the problem of compressing a uniformly quantized
independent and identically distributed (IID) source. We present a new compression
technique, bit-wise arithmetic coding, that assigns fixed-length codewords to the
quantizer output and uses arithmetic coding to compress the codewords, treating
the codeword bits as independent. We examine the performance of this method and
evaluate the overhead required when used block-adaptively. Simulation results are
presented for Gaussian and Laplacian sources. This new technique could be used
as the entropy coder in a transform or subband coding system.


I. Introduction 

Many lossy data compression systems consist of sep- 
arate decorrelation and entropy coding stages. In such 
schemes, the source data are transformed by some tech- 
nique (e.g., discrete cosine transform or subband cod- 
ing) with the goal of producing decorrelated components. 
Each component is independently scalar quantized and 
the quantizer output is losslessly compressed. Frequently 
each component is modeled as a sequence of IID random 
variables. This model motivates the topic of this article: 
block-adaptive compression of uniformly quantized sam- 
ples from an IID source. 

The traditional approach to the problem of efficiently 
encoding the quantizer output symbols is to use a variable- 
length code, assigning shorter codewords to the more prob- 
able symbols. The well-known Huffman coding technique 
gives the optimal such assignment. This method, however, 
has some performance limitations. Since each source sam- 
ple must be mapped to a codeword of length at least 1, the 
rate of a Huffman code can never be less than 1 bit/sample, 
no matter how small the entropy. The redundancy of Huffman
codes, the difference between rate and entropy, has been
studied and bounded by many researchers, e.g., [4]. A
common approach to reducing the redundancy at low en- 
tropy is to combine Huffman coding with zero-runlength 
encoding or to encode groups of symbols rather than indi- 
vidual symbols. In this article, we present an alternative 
solution. 

Another problem with Huffman codes is that in block- 
adaptive situations there may be significant overhead costs 
(the extra symbols required to identify to the decoder the 
code being used) [1,5]. Finally, if the buffer overflows (say, 
if the source is much less compressible than anticipated), 
we may be forced to discard source samples. 

With these problems in mind, we introduce a new technique,
bit-wise arithmetic coding. The solution we propose
is to assign a fixed-length binary codeword to each
output symbol in such a way that a zero is more likely
than a one in every codeword bit position. We then take
the codewords corresponding to several adjacent quantizer
output symbols and use a binary arithmetic encoder (see
footnote 1) to encode the first codeword bit for each of
these symbols. We repeat this procedure for each group of
codeword bits, treating each group independently from the
others. This technique can be thought of as a simple
progressive transmission scheme using an arithmetic coder to
independently encode each level of detail. It turns out that
this technique is often surprisingly efficient, despite the
fact that interbit dependencies are ignored.

1 The term “binary encoder” is intended as shorthand for
“binary-input binary-output encoder.”

In Section II, we define the uniform quantizer parameters.
In Section III, we analyze the block-adaptive binary
arithmetic encoder that will be used as part of the bit-wise
arithmetic encoding. We make use of this analysis in
Section IV, where we examine in detail the bit-wise
arithmetic encoding procedure. We present performance
results in Section IV.C.

II. Uniform Quantizer 

For several reasons we limit our investigation to the 
uniform quantizer, not the least of which is simplicity of 
implementation and analysis. The uniform quantizer of- 
ten outperforms the Lloyd-Max quantizer in terms of rate- 
distortion performance (see, e.g., [2]); more important, if 
we do not know the source statistics a priori, it may be dif- 
ficult to design a more suitable quantizer. The proposed 
new method could also be used with a nonuniform quan- 
tizer, but the analysis would be less tractable. 

A continuous source with probability density function
(pdf) $f(x)$ and variance $\sigma^2$ is quantized by a uniform
quantizer having $b$ bits and bin width $\delta\sigma$, as in Fig. 1. The
quantizer output is an index $i$ in the range
$1 - 2^{b-1} \le i \le 2^{b-1}$, identifying which of the $2^b$ intervals
contains the source sample. A source sample lying in $[T_{i-1}, T_i)$ has
reconstruction point $i\delta\sigma$, where the quantizer thresholds
are $T_i = (i + \tfrac{1}{2})\delta\sigma$ for $|i| \le 2^{b-1} - 1$, and
$T_{2^{b-1}} = \infty$, $T_{-2^{b-1}} = -\infty$. We could obtain a lower
distortion by using reconstruction points that are equal to the center of
mass [with respect to $f(x)$] of each interval, but since we
wish to use this quantizer in adaptive situations, we cannot
generally compute this quantity.

Note that the quantizer is asymmetric: There is a reconstruction
point at the origin, and since there is an even
number of points, there is an “extra” reconstruction point,
which we arbitrarily choose to place on the positive side.
The obvious alternative, a symmetric quantizer that has
no reconstruction point at the origin, results in poor performance
when $\delta$ is large relative to $\sigma^2$.

Let $p_i$ denote the probability that the quantizer output
is index $i$, and let $d_i$ denote the contribution to the mean
square error (MSE) from the interval $[T_{i-1}, T_i)$:

$$p_i = \int_{T_{i-1}}^{T_i} f(x)\,dx, \qquad d_i = \int_{T_{i-1}}^{T_i} (x - i\delta\sigma)^2 f(x)\,dx \qquad (1)$$

The MSE is equal to $\sum_i d_i$. It will be convenient to let $\mathcal{P}$
denote the discrete distribution $\{p_i\}_{i=1-2^{b-1}}^{2^{b-1}}$.
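As a rough illustration of Eq. (1), the following sketch computes the bin probabilities $p_i$ and the entropy of $\mathcal{P}$ for a zero-mean Gaussian source. The function names (`gaussian_cdf`, `thresholds`, `bin_probabilities`, `entropy`) are hypothetical helpers introduced here, not part of the article, and the choice $b = 4$, $\delta = 0.5$ is arbitrary. The distortion terms $d_i$ are not computed.

```python
import math

def gaussian_cdf(x, sigma=1.0):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return 0.0
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def thresholds(b, delta, sigma=1.0):
    """Thresholds T_i = (i + 1/2) * delta * sigma, with infinite end points."""
    half = 2 ** (b - 1)
    T = {i: (i + 0.5) * delta * sigma for i in range(1 - half, half)}
    T[-half] = -math.inf
    T[half] = math.inf
    return T

def bin_probabilities(b, delta, sigma=1.0):
    """p_i = integral of the pdf over [T_{i-1}, T_i), for i = 1-2^(b-1) .. 2^(b-1)."""
    half = 2 ** (b - 1)
    T = thresholds(b, delta, sigma)
    return {i: gaussian_cdf(T[i], sigma) - gaussian_cdf(T[i - 1], sigma)
            for i in range(1 - half, half + 1)}

def entropy(probs):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

p = bin_probabilities(b=4, delta=0.5)
print(len(p), sum(p.values()), entropy(p.values()))
```

Because the Gaussian pdf is symmetric, the computed probabilities satisfy $p_{-i} = p_i$ for the interior bins, and the most negative bin collects the mass of the two outermost positive bins.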

In Figs. 2(a) and (b) we plot the resulting rate-distortion
curves (computed analytically) for Gaussian and
Laplacian sources over a wide range of $\delta$. Note that the
curves show optimal performance when the uniform quantizer
is used, rather than the theoretical rate-distortion
limit, i.e., the rate shown is the entropy of $\mathcal{P}$. The large
range of $\delta$ causes the peculiar loop in the curve: when $\delta 2^b$
becomes too small relative to $\sigma^2$, the performance is poor
because the overload bin probabilities become large while
their reconstruction points are too close to zero. This situation
should be easily avoidable in practice, as we would
expect the source range to be finite and known in advance
because of hardware constraints.

In general we will assume

$$p_{-i} = p_i, \quad 0 < i \le 2^{b-1} - 2 \qquad (2)$$

$$p_{1-2^{b-1}} = p_{2^{b-1}} + p_{2^{b-1}-1} \qquad (3)$$

$$p_i \text{ is nonincreasing in } |i| \qquad (4)$$

These conditions are true when $f(x)$ is symmetric about
$x = 0$ and nonincreasing with $|x|$, and $\delta$ is not unreasonably
small.

For sufficiently large $b$, the rate-distortion curve is virtually
independent of $b$, as can be seen in Figs. 2(c) and
(d), where we plot rate-distortion curves for $b = 4$ and
$b = 8$ for Gaussian and Laplacian sources. Increasing $b$ has
the effect of increasing the useful range of $\delta$ and lengthening
the useful portion of the rate-distortion curve.

III. Block-Adaptive Binary Arithmetic 
Encoding 

In this section, we analyze the operation of the block- 
adaptive binary arithmetic encoder that will be used as 
part of the bit-wise arithmetic encoding procedure. 
A. Binary Arithmetic Encoder Operation

It is well known that a binary arithmetic encoder that 
is well-tuned to the source can achieve a rate quite close to 
the source entropy. Our goal in this section is to determine 
the performance we can expect from an encoder that may 
not be well-tuned. For more details on arithmetic coding 
see [6,7,8]. 

A binary arithmetic encoder with parameter $P$, the anticipated
probability of a zero, maps an $N$-length sequence
of bits $s$ into an interval $[l, r) \subset [0, 1]$ whose width is

$$r - l = P^{NF}(1 - P)^{N(1-F)} \qquad (5)$$

where $F$ is the fraction of bits in $s$ that are zero. Ideally,
we would like to have $P = F$, but this might not always
be possible.

Example 1. Suppose $P = 5/13$, $s = 01$. Initially,
$[l, r) = [0, 1)$. We divide this interval into $[0, 5/13)$,
$[5/13, 1)$. Note that this first interval has width of $P =
5/13$ of the total interval. On receiving $s_1 = 0$, we assign
$[l, r) = [0, 5/13)$ because the symbol with anticipated
probability of 5/13 was received. Again we divide $[0, 5/13)$
into $[0, 25/169)$, $[25/169, 5/13)$. After receiving $s_2 = 1$, we
assign $[l, r) = [25/169, 5/13)$. Note that this assignment
satisfies Eq. (5).

A $K$-bit output sequence from the encoder maps to
an interval $[i2^{-K}, (i + 1)2^{-K})$ for some $0 \le i \le 2^K - 1$.
For our application, since $N$ is known to the decoder, the
encoder must specify the largest interval of this form that
is contained in $[l, r)$. That is, the encoder must use as
few bits as possible to identify to the decoder a sequence
beginning with $s$.

Example 2. Continuing Example 1, after calculating
$[l, r) = [25/169, 5/13)$, the encoder must find an interval of
the form $[i2^{-K}, (i + 1)2^{-K})$ such that
$[i2^{-K}, (i + 1)2^{-K}) \subset [l, r)$. One such interval is
$[1/4, 3/8)$, which corresponds to
output sequence 010 (because this interval is equal to the
set of numbers having binary expansion beginning with
0.010). We will verify in Example 3 that this is in fact the
encoder output sequence.

Knowing $P$, the decoder realizes that $[1/4, 3/8) \subset
[525/2197, 5/13)$ and that this latter interval corresponds
to input sequence 011. Since the decoder also knows that
$N = 2$, it takes the first two bits, giving output 01, which
is in fact the encoder input sequence.
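Examples 1 and 2 can be replayed with exact rational arithmetic. The sketch below is an idealized, arbitrary-precision illustration only (not the 9-bit encoder used for the article's simulations); the function names `encode_interval`, `shortest_codeword`, and `decode` are hypothetical.

```python
import math
from fractions import Fraction

def encode_interval(bits, P):
    """Narrow [l, r) once per input bit; a zero keeps the lower fraction P."""
    l, r = Fraction(0), Fraction(1)
    for s in bits:
        split = l + P * (r - l)
        if s == "0":
            r = split
        else:
            l = split
    return l, r

def shortest_codeword(l, r):
    """Shortest bit string whose dyadic interval [i/2^K, (i+1)/2^K) fits in [l, r)."""
    K = 0
    while True:
        i = math.ceil(l * 2 ** K)          # smallest i with i / 2^K >= l
        if Fraction(i + 1, 2 ** K) <= r:
            return format(i, "b").zfill(K) if K else ""
        K += 1

def decode(codeword, P, N):
    """Recover N input bits by replaying the interval splits."""
    value = Fraction(int(codeword, 2), 2 ** len(codeword)) if codeword else Fraction(0)
    l, r = Fraction(0), Fraction(1)
    out = []
    for _ in range(N):
        split = l + P * (r - l)
        if value < split:
            out.append("0")
            r = split
        else:
            out.append("1")
            l = split
    return "".join(out)

P = Fraction(5, 13)
l, r = encode_interval("01", P)
code = shortest_codeword(l, r)
print(l, r, code, decode(code, P, 2))
```

The decoder here uses the lower end of the codeword's dyadic interval as a representative point, which is sufficient because the whole dyadic interval lies inside $[l, r)$.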


We continue with the derivation of the encoder rate.
We can write $[l, r) = [j2^{-J} + L2^{-J},\ j2^{-J} + R2^{-J})$ where
$0 \le L < 1/2 < R \le 1$ for integers $j$ and $J$. That is,

$$[l, r) \subset [j2^{-J}, (j + 1)2^{-J}) \qquad (6)$$

for maximum $J$ and some $j$. The first $J$ bits of the
output sequence map to the interval $[j2^{-J}, (j + 1)2^{-J})$,
and the remaining bits correspond to a subinterval of
$[L, R)$. Comparing interval widths in Eq. (6), we find
that $2^{-J} \ge r - l = P^{NF}(1 - P)^{N(1-F)}$, which implies
$J \le N h(P, F)$ where

$$h(P, F) = -F \log_2 P - (1 - F)\log_2(1 - P)$$

which is a line tangent to the binary entropy function
$\mathcal{H}(F)$ at the point $F = P$. If we are very lucky, then
$[j2^{-J}, (j + 1)2^{-J}) = [l, r)$ exactly and we are done encoding,
in which case the rate is $J/N$, so the encoder rate
$R_{\text{arith}}$ (the number of output bits divided by the number
of input bits) satisfies $R_{\text{arith}} \ge h(P, F)$.
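A small numerical check of the tangent-line property may help here. The helper names `binary_entropy` and `h` below are introduced only for illustration; the last line evaluates $N h(P, F)$ for the numbers of Example 1 ($P = 5/13$, $N = 2$, $F = 1/2$).

```python
import math

def binary_entropy(F):
    """Binary entropy function in bits, with H(0) = H(1) = 0."""
    if F <= 0.0 or F >= 1.0:
        return 0.0
    return -F * math.log2(F) - (1.0 - F) * math.log2(1.0 - F)

def h(P, F):
    """Rate line h(P, F) = -F log2 P - (1 - F) log2 (1 - P)."""
    return -F * math.log2(P) - (1.0 - F) * math.log2(1.0 - P)

# h(P, .) touches the entropy curve at F = P and lies above it elsewhere.
grid = [k / 100 for k in range(1, 100)]
gap = [h(0.3, F) - binary_entropy(F) for F in grid]

bits_example_1 = 2 * h(5 / 13, 0.5)   # N * h(P, F) for Example 1
print(min(gap), bits_example_1)
```

For Example 1 the quantity $N h(P, F) = \log_2(169/40)$ is a little over 2 bits, so a 3-bit output such as 010 is as short as an integer-length output can be.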

Usually we will not be so lucky, and we must send additional
bits. In the worst case, $L \ne 0$ and $R \ne 1$, in
which case we can assign these final bits to be $10^n$, which
corresponds to $[1/2, 1/2 + 2^{-n-1})$.

Example 3. Continuing Example 2, we can write
$[l, r) = [25/169, 5/13) = [0 \times 2^{-1} + (50/169) \times 2^{-1},\ 0
\times 2^{-1} + (10/13) \times 2^{-1})$, i.e., $j = 0$, $J = 1$, $L = 50/169 <
1/2 < R = 10/13$. The first output bit is 0 (corresponding
to $j = 0$). The remaining bits, $10^n$ (corresponding to
$[1/2, 1/2 + 2^{-n-1})$), must map to a subinterval of $[L, R)$,
which implies $n = 1$, so the output sequence is 010.

The value $n$ must be sufficiently large that
$1/2 + 2^{-n-1} \le R$, which implies

$$2^{-n-1} \le R - 1/2 < R - L = 2^J P^{NF}(1 - P)^{N(1-F)}$$

making use of Eq. (5) and the fact that $2^{-J}(R - L) = r - l$.
Alternatively, we may use $01^n$ instead of $10^n$ if this gives
smaller $n$. Consequently,

$$n \le \left\lceil -\log_2\left[2^J P^{NF}(1 - P)^{N(1-F)}\right] \right\rceil - 1 = \lceil N h(P, F) \rceil - J - 1$$
The total number of encoder output bits is $J + 1 + n$, so
we have

$$h(P, F) \le R_{\text{arith}} \le \frac{\lceil N h(P, F) \rceil}{N} \qquad (7)$$

which suggests the close approximation

$$R_{\text{arith}} \approx h(P, F) + \frac{1}{2N} \qquad (8)$$

B. The Effect of Finite Precision

The bound of Eq. (7) is valid when the encoder can perform
arithmetic with arbitrary precision. In any practical
system, however, we are limited in our ability to represent
the interval endpoints $l$, $r$. Whenever $l \ge 1/2$ or $r \le 1/2$,
we can transmit an output bit and rescale the interval (reassign
$l$ and $r$) in the obvious way so that we can make the
most of the available resolution. Failing to do this would
degrade performance and severely limit the length of input
sequences that can be encoded. A consequence of this
rescaling is that $l \in [0, 1/2)$ and $r \in (1/2, 1]$, which means
that at 9 bits of resolution, which is the resolution used for
all simulations and discussions that follow, the values of $l$
and $r$ used by the encoder are always multiples of $2^{-10}$.

The effect of the finite resolution is that $P$, the anticipated
probability of a zero, is almost never represented
exactly (see footnote 2). In fact, $P_{\text{eff}}$, the effective value of $P$,
varies as the algorithm progresses. At each iteration, the interval
between $l$ and $r$ is divided into units of length $2^{-10}$, and the
number of these units is equal to the resolution available to
represent $P$. For example, in the worst case, $l = 1/2 - 2^{-10}$
and $r = 1/2 + 2^{-10}$, producing $P_{\text{eff}} = 1/2$. Note that we
can never allow $P_{\text{eff}}$ to be zero or one because this would
result in an input symbol having no effect on the interval,
making decoding impossible. Unfortunately, bounds
on rate derived from bounds on $P_{\text{eff}}$ are uselessly weak.

2 There are some trivial exceptions to this, such as when $P = 1/2$.

Because of the finite resolution, the final interval width
is not exactly $P^{NF}(1 - P)^{N(1-F)}$ as it is for the arbitrary
precision system, but rather
$\prod_{i:\,s_i = 0} P_{\text{eff},i} \prod_{i:\,s_i = 1} (1 - P_{\text{eff},i})$.
Thus, the rate depends not only on $P$, $F$,
and the resolution available, but also on the particular input
sequence since $P_{\text{eff},i}$ depends on $s_1, s_2, \ldots, s_{i-1}$. The
consequence is that Eq. (8) is a slightly optimistic estimate
of the rate.

Let $k$ equal the number of extra encoded bits resulting
from the limited resolution, when compared to the sequence
length predicted by Eq. (8), so that

$$R_{\text{arith}} = h(P, F) + \frac{k + 1/2}{N}$$

At 9-bit resolution, simulations indicate that $k$ has an expected
value of approximately 0.32 bits, and standard deviation
of 1.01, nearly independent of $P$, $F$, and $N$ (see footnote 3). Thus,
as an approximation to the rate we use

$$R_{\text{arith}} \approx h(P, F) + 0.82/N \qquad (9)$$

for a system with 9-bit resolution. Increasing the resolution
should have the effect of reducing the variance and
expected value of $k$. Figure 3 gives an example of encoder
performance when $P$ is fixed.

C. Overhead 

We would like to use binary arithmetic encoding block- 
adaptively to transmit a sequence of N bits, i.e., the en- 
coder output sequence is preceded by overhead bits that 
identify to the decoder the value of P being used. By 
using log 2 N bits of overhead, we could specify P = F ex- 
actly, but by using fewer bits we can exchange accuracy for 
lower overhead. In this section, we will explore this trade- 
off, showing how to find the optimal number of overhead 
bits. 

Assume for now that we have a fixed number of overhead
bits $m$. We select an ordered set of $M = 2^m$ probabilities
$\{p_1, p_2, \ldots, p_M\}$, known to the encoder and decoder in
advance, that can be used as values for $P$. We first show
how to find the optimal assignment of these probability
points.

Given the set $\{p_1, p_2, \ldots, p_M\}$, the encoder would
choose to use the $p_i$ that minimizes the rate, i.e., the
$p_i$ that minimizes $h(p_i, F)$. As illustrated in Fig. 4, this
amounts to using line segments to approximate the binary
entropy function. Let $t_i$ denote the solution to
$h(p_i, t_i) = h(p_{i+1}, t_i)$ and define $\Delta_i = h(p_i, t_i) - \mathcal{H}(t_i)$,
which is the redundancy (additional rate) at this “corner”
point $t_i$. Clearly if we choose the $p_i$'s to minimize $\max_i \Delta_i$,
then we minimize the maximum redundancy over all possible
values of $F$.

Suppose that for some $i$, we have $\Delta_{i-1} > \Delta_i$, as in
Fig. 4. Then by decreasing $p_i$, we can decrease $\Delta_{i-1}$ and
increase $\Delta_i$. From this argument we can see that the assignment
of the $p_i$ is optimal when $\Delta_i$ is the same for all
$i$. We call this optimal quantity $\Delta^*(M)$. Thus, we want
to find $M$, the minimum number of line segments required
to approximate the binary entropy function to within $\Delta^*$
everywhere.

3 When $P$ is very close to 0 or 1, the encoder rate is less well behaved.
For example, if $P < 2^{-10}$, then clearly $P_{\text{eff}}$ will always exceed $P$.
We ignore these extreme cases.

Given $\Delta^*(M)$, we find that $p_1 = 1 - 2^{-\Delta^*}$. Given
$p_i$, we find $t_i$ by solving

$$h(p_i, t_i) - \mathcal{H}(t_i) = \Delta^*(M)$$

Given $t_i$, we find $p_{i+1}$ by solving

$$h(p_{i+1}, t_i) - \mathcal{H}(t_i) = \Delta^*(M)$$

i.e., we solve the same equation in the other direction. This
procedure can be used to find the optimal set of $p_i$ and
$t_i$ given $\Delta^*(M)$, or in an iterative procedure to compute
$\Delta^*(M)$. We can also take advantage of the fact that the
optimal $p_i$ must be symmetric about 1/2. In Table 1,
$\Delta^*(M)$ is given for several values of $M$.
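The marching procedure just described can be sketched as follows. This is one possible implementation under stated assumptions, not the author's code: it uses the identity that $h(p, t) - \mathcal{H}(t)$ equals the binary Kullback-Leibler divergence of $t$ from $p$, solves each step by bisection, and finds $\Delta^*(M)$ by an outer bisection on $\Delta$. All function names are hypothetical.

```python
import math

def kl_bits(t, p):
    """Redundancy h(p, t) - H(t), i.e. the binary KL divergence D(t || p) in bits."""
    total = 0.0
    if t > 0.0:
        total += t * math.log2(t / p)
    if t < 1.0:
        total += (1.0 - t) * math.log2((1.0 - t) / (1.0 - p))
    return total

def bisect(f, lo, hi, iters=200):
    """Root of an increasing function f on [lo, hi] with f(lo) <= 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def points_for(delta, max_points=10_000):
    """March left to right: p_1, t_1, p_2, ... keeping every corner redundancy at delta."""
    points = [1.0 - 2.0 ** (-delta)]            # redundancy at F = 0 equals delta
    while -math.log2(points[-1]) > delta and len(points) < max_points:
        p = points[-1]
        t = bisect(lambda x: kl_bits(x, p) - delta, p, 1.0)          # corner right of p
        nxt = bisect(lambda x: kl_bits(t, x) - delta, t, 1.0 - 1e-15)  # next point right of t
        points.append(nxt)
    return points

def delta_star(M, iters=60):
    """Smallest delta for which M probability points suffice."""
    lo, hi = 1e-9, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if len(points_for(mid)) <= M:
            hi = mid
        else:
            lo = mid
    return hi

print(delta_star(1), delta_star(2), points_for(delta_star(2)))
```

The loop in `points_for` stops as soon as the last point already covers $F = 1$ with redundancy at most $\Delta$, so the length of the returned list is the number of line segments needed for that $\Delta$.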

To find the relationship between $\Delta^*$ and $M$ for large $M$,
let $w(p)$ denote the spacing between adjacent probability
points in the neighborhood of $p$ so that the corner point
is $t \approx p + w/2$. At this point,

$$\Delta^* \approx h(p, p + w/2) - \mathcal{H}(p + w/2) \approx \frac{w^2}{8p(1 - p)\ln 2}$$

using the first three terms of the Taylor series expansion
of $\mathcal{H}(p)$. Now $1/w(p)$ is the density of probability points,
so

$$M \approx \int_0^1 \frac{1}{w(p)}\,dp \approx \frac{1}{\sqrt{8\Delta^* \ln 2}} \int_0^1 \frac{dp}{\sqrt{p(1 - p)}} = \frac{\pi}{\sqrt{8\Delta^* \ln 2}}$$

thus for large $M$,

$$\Delta^*(M) \approx \frac{\pi^2}{8M^2 \ln 2} \qquad (10)$$

In Fig. 5 we show the exact and approximate relationship
between $M$ and $\Delta^*$.
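The quality of the large-$M$ approximation can be checked directly against the tabulated values. In this short sketch, `TABLE_1` simply copies the entries of Table 1 and `delta_star_approx` (a hypothetical name) evaluates Eq. (10).

```python
import math

# Exact values copied from Table 1, keyed by M.
TABLE_1 = {
    1: 1.0, 2: 0.32193, 4: 0.093506, 8: 0.025407, 16: 0.0066389,
    32: 0.0016980, 64: 0.00045745, 128: 0.00010800, 256: 0.000027077,
}

def delta_star_approx(M):
    """Large-M approximation of Eq. (10): pi^2 / (8 M^2 ln 2)."""
    return math.pi ** 2 / (8.0 * M ** 2 * math.log(2.0))

for M, exact in TABLE_1.items():
    print(M, exact, delta_star_approx(M), delta_star_approx(M) / exact)
```

The ratio printed in the last column approaches 1 as $M$ grows, which is the behavior Fig. 5 is described as showing.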


The encoding operation is straightforward. The encoder
computes $F$ by counting the number of zeros in the
input sequence. The largest integer $I$ satisfying $t_{I-1} \le F$
gives the optimal $p_I$, which is the parameter used in the
encoding procedure. The encoder uses $m$ bits to identify $I$,
followed by the arithmetic encoder output sequence. This
procedure guarantees that

$$h(p_I, F) \le \mathcal{H}(F) + \Delta^*(2^m)$$

The rate (including overhead) required to block-adaptively
transmit a sequence of $N$ bits is approximately

$$\mathcal{H}(F) + m/N + \Delta^*(2^m) + 0.82/N \qquad (11)$$

where $\mathcal{H}(F)$ comes from the source uncertainty; $m/N$
comes from the $m$ overhead bits used to identify $p_I$;
$\Delta^*(2^m)$ is a result of using $m$ bits to specify $F$ approximately
instead of using $\log_2 N$ bits to specify $F$ exactly
(this amounts to a worst-case assumption); and
$0.82/N$ comes from the finite resolution of the encoder
[see Eq. (9)].

Given $N$, the optimal number of overhead bits is

$$m^*(N) = \arg\min_m \left\{\mathcal{H}(F) + 0.82/N + \Delta^*(2^m) + m/N\right\} = \arg\min_m \left\{\Delta^*(2^m) + m/N\right\} \qquad (12)$$

which is tabulated in Table 2.
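Equation (12) is a one-line minimization once the values of $\Delta^*(2^m)$ are available. The sketch below uses the entries of Table 1 (so it only covers $m \le 8$); the names `DELTA_STAR` and `optimal_overhead_bits` are introduced here for illustration.

```python
# DELTA_STAR[m] is the Table 1 value of Delta*(2^m), for m = 0..8.
DELTA_STAR = [1.0, 0.32193, 0.093506, 0.025407, 0.0066389,
              0.0016980, 0.00045745, 0.00010800, 0.000027077]

def optimal_overhead_bits(N):
    """m*(N): the m minimizing Delta*(2^m) + m / N, restricted to tabulated m."""
    return min(range(len(DELTA_STAR)), key=lambda m: DELTA_STAR[m] + m / N)

for N in (1, 4, 5, 256, 512, 806, 807):
    print(N, optimal_overhead_bits(N))
```

Running this over a range of $N$ reproduces the block-length intervals listed in Table 2 for the tabulated values of $m$.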

Using Eqs. (10) and (12), we find that for large $m$, the
optimal value of $m$ satisfies

$$0 = \frac{\partial}{\partial m}\left[\Delta^*(2^m) + m/N\right] \approx \frac{1}{N} - \pi^2 2^{-2-2m}$$

which gives

$$m^*(N) \approx \tfrac{1}{2}\log_2 N + \log_2 \pi - 1 \qquad (13)$$

and

$$\Delta^*\left(2^{m^*(N)}\right) \approx \frac{1}{2N \ln 2} \qquad (14)$$

Substituting into Eq. (11), we find that the rate of a block-adaptive
binary arithmetic encoder, including overhead, is
approximately

$$\mathcal{H}(F) + \frac{1}{N}\left[\tfrac{1}{2}\log_2 N - 0.18 + \log_2 \pi + \frac{1}{2\ln 2}\right] \qquad (15)$$

IV. Bit-Wise Arithmetic Coding 

A. Operation of the Bit-Wise Arithmetic Encoder 

We now use the results of Section III to analyze the per- 
formance of bit-wise arithmetic encoding. First we outline 
the encoding procedure. 

Each quantizer output symbol is mapped to a $b$-bit
codeword. The first bit indicates the sign of the quantizer
reconstruction point (see footnote 4). The remaining bits are assigned
to quantizer levels in increasing lexicographic order as we
move away from the origin. Figure 6 illustrates this mapping
for $b = 4$. Because of Eqs. (2)-(4), a zero will be more
likely than a one in every bit position.
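The mapping can be sketched in a few lines. The rule used below (sign bit 1 for positive indices, magnitude bits equal to $|i|$ for $i \le 0$ and to $i - 1$ for $i > 0$) is an interpretation inferred from the description above and from the $b = 4$ example matrix given later in this section; the function names `codeword` and `bit_planes` are hypothetical.

```python
def codeword(i, b):
    """b-bit codeword for quantizer index i, 1 - 2^(b-1) <= i <= 2^(b-1), b >= 2."""
    half = 2 ** (b - 1)
    if b < 2 or not (1 - half <= i <= half):
        raise ValueError("index out of range for this quantizer")
    sign = 1 if i > 0 else 0                    # 1 when the reconstruction point is positive
    magnitude = i - 1 if i > 0 else -i          # grows as we move away from the origin
    return f"{sign}{magnitude:0{b - 1}b}"

def bit_planes(indices, b):
    """Group the k-th bits of N codewords together, one string per bit position."""
    words = [codeword(i, b) for i in indices]
    return ["".join(w[k] for w in words) for k in range(b)]

print([codeword(i, 4) for i in range(-7, 9)])
print(bit_planes([0, 1, -1], 4))
```

Each string returned by `bit_planes` is the $N$-bit sequence that would be handed to one pass of the block-adaptive binary arithmetic encoder.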

Codewords corresponding to $N$ adjacent source samples
are grouped together. The $N$ sign bits of the codeword
sequence are encoded using the block-adaptive binary
arithmetic encoder analyzed in Section III. Then
the $N$ next-most-significant bits are encoded, and so on.
This can be viewed as a simple progressive transmission
system: each subsequent codeword bit gives a further level
of detail about the source. Each bit sequence is encoded
independently: at the $i$th stage the arithmetic coder calculates
(approximately) the unconditional probability that
the $i$th codeword bit is a zero.

The obvious loss is that we lose the benefit of interbit 
dependency. For example, the probability that the second 
bit is a zero is not generally independent of the value of 
the first bit, though the encoding procedure acts as if it 
were. Huffman coding does not suffer from this loss, which 
we examine in Section IV. B. 

The advantage is that for many practical sources, this
technique has lower redundancy than Huffman coding, because
the arithmetic coder is not required to produce an
output symbol for every input symbol. Also this scheme
is relatively simple (see footnote 5) and has some advantages in terms
of overhead: because the number of codewords is $2^b$, the
overhead of block-adaptive Huffman coding increases exponentially
in $b$ unless we are able to cleverly exploit additional
information about the source [5]. By contrast,
the overhead required for bit-wise arithmetic encoding increases
linearly in $b$ because the codeword bits are treated
independently.

4 Or, to be precise, the first bit indicates whether the reconstruction
point is positive.

5 To the extent that a binary arithmetic coder is simple.

Another advantage is that this technique gives us a sim- 
ple means of handling situations where we are rate con- 
strained (or equivalently, buffer-constrained): We simply 
encode the blocks of N bits until the allocated rate is ex- 
hausted (or the buffer is full). The distortion is automati- 
cally reduced for “more compressible” sources — when the 
most significant bits can be efficiently encoded, we are able 
to send additional (less-significant) bits, so the encoder 
resolution increases automatically. This would mean, for 
example, that a block having 6-bit resolution might be 
followed by a block having only 8-bit resolution. The ad- 
vantage is that we hope to prevent any sample from having 
zero-bit resolution. 

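The obvious question is whether the gains offset the
losses. Given $b$, $\delta$, and $f(x)$, we can use Eq. (1) to compute
$\mathcal{P}$, the distribution on the quantizer output symbols, and
use this result to compute $\pi_i$, the probability that the $i$th
bit is a zero. For example, for $b = 4$,

$$\begin{bmatrix} \pi_1 \\ \pi_2 \\ \pi_3 \\ \pi_4 \end{bmatrix}
= \begin{bmatrix}
1&1&1&1&1&1&1&1&0&0&0&0&0&0&0&0 \\
0&0&0&0&1&1&1&1&1&1&1&1&0&0&0&0 \\
0&0&1&1&0&0&1&1&1&1&0&0&1&1&0&0 \\
0&1&0&1&0&1&0&1&1&0&1&0&1&0&1&0
\end{bmatrix}
\begin{bmatrix} p_{-7} \\ \vdots \\ p_8 \end{bmatrix}
= \begin{bmatrix}
1&1&1&1&1&1&1&1&1 \\
1&2&2&2&1&0&0&0&0 \\
1&2&1&0&1&2&1&0&0 \\
1&1&1&1&1&1&1&1&0
\end{bmatrix}
\begin{bmatrix} p_0 \\ p_1 \\ \vdots \\ p_8 \end{bmatrix}$$

using the symmetries of the $p_i$ described in Section II. It
turns out that $\pi_1 = (1 + p_0)/2$ and $\pi_b = \pi_1 - p_{2^{b-1}}$. The
rate obtained is (assuming for the moment that we have
an idealized system where we may neglect overhead)

$$\sum_{i=1}^{b} \mathcal{H}(\pi_i)$$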

For the Gaussian and Laplacian sources, this result can 
be compared to Huffman coding in Figs. 7 and 8. Over 
the useful range of the quantizer, the rate of the bit-wise 
arithmetic coding scheme is quite close to the entropy. 

B. The Redundancy of Bit-Wise Transmission 

We have already examined some of the sources that con- 
tribute to the rate of bit-wise arithmetic encoding: over- 
head bits, finite precision arithmetic, and the use of ap- 
proximate rather than exact representation of F. We now 
examine the most obvious redundancy component: the 
added rate that results from treating each bit indepen- 
dently. 

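Let $\beta_i$ be the random variable that is equal to the value
of the $i$th bit of the codeword corresponding to the quantizer
output. The source entropy is then the entropy of
the $\beta_i$, $H(\beta_1\beta_2\cdots\beta_b)$, but the independent encoding of
the bits results in a rate of $H(\beta_1) + H(\beta_2) + \cdots + H(\beta_b)$.
Thus, the redundancy is

$$\mathcal{R} = \sum_{i=1}^{b} H(\beta_i) - H(\beta_1\beta_2\cdots\beta_b)$$

$$= \sum_{i=1}^{b} H(\beta_i) - \left[H(\beta_1) + H(\beta_2 \mid \beta_1) + \cdots + H(\beta_b \mid \beta_{b-1}\beta_{b-2}\cdots\beta_1)\right]$$

$$= \sum_{i=2}^{b} I(\beta_i; \beta_{i-1}\beta_{i-2}\cdots\beta_1) \qquad (16)$$

where $I$ is the mutual information function. So, for example,
if $b = 2$, the redundancy is equal to the mutual
information between $\beta_1$ and $\beta_2$.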

Of course, $\mathcal{R} \ge 0$, with equality if and only if the $\beta_i$
are independent. This bound is tight: zero redundancy
occurs, for example, if the pi are distributed according to 
the two-sided exponential distribution pi = which 

is a distribution suggested in [3] as a model for certain 
real-world sources. The exponential distribution is close 
to the distribution obtained for a Laplacian source, which 
explains why bit-wise arithmetic coding works well for this 
source. 

A greedy assignment of bits results in almost exactly
the same mapping from quantizer output symbol to codeword.
In the first bit position, assign a 0 to the $2^{b-1}$
indices having the largest $p_i$, and a 1 to the others. In the
second bit position, among the quantizer output indices
having the same value in the first bit position, assign a 0
to the $2^{b-2}$ indices having the largest $p_i$, and so on. In
this manner, the sign bit is the last codeword bit.

It should be noted that other codeword assignments, for
example, assigning codewords so that Hamming weight is
strictly nonincreasing in $|i|$, can sometimes give lower redundancy.
Unless the distribution of the $p_i$ is known a
priori, we cannot in general determine the optimal assignment.
The codeword assignment proposed here has the
advantages of symmetry and usefulness from a progressive
transmission or buffer-constrained standpoint.

If we relax the assumptions of Eqs. (2)-(4), then the
pathological case $p_0 = p_{2^{b-1}} = 1/2$ gives the maximum
possible redundancy of $\mathcal{R}_{\max} = b - 1$. We would like
to have a tight upper bound on redundancy when we
maintain Eqs. (2)-(4). For $b = 2$, it is simple to verify
that the maximum redundancy occurs when $\mathcal{P} =
\{p_{-1}, p_0, p_1, p_2\} = \{1/3, 1/3, 1/3, 0\}$, which gives redundancy
of $\log_2 3 - 4/3 \approx 0.252$. For larger $b$, it becomes
more difficult to determine analytically what distribution
gives maximum redundancy.

Example 4. If $b = 3$, the restrictions of Eqs. (2)-(4)
imply that any valid distribution $\mathcal{P}$ can be written as a
convex combination of the (mostly) uniform distributions

$\mathcal{P}_1 = \{0, 0, 0, 1, 0, 0, 0, 0\}$

$\mathcal{P}_2 = \{0, 0, 1/3, 1/3, 1/3, 0, 0, 0\}$

$\mathcal{P}_3 = \{0, 1/5, 1/5, 1/5, 1/5, 1/5, 0, 0\}$

$\mathcal{P}_4 = \{1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 0\}$

$\mathcal{P}_5 = \{1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/14, 1/14\}$

Thus, we wish to maximize the redundancy function of
Eq. (16) over the convex hull of $\{\mathcal{P}_1, \mathcal{P}_2, \mathcal{P}_3, \mathcal{P}_4, \mathcal{P}_5\}$.
Simulations suggest that the maximum occurs at $\mathcal{P}_3$, which
gives redundancy of $\tfrac{2}{5}[5\log_2 5 - 3\log_2 3 - 6] \approx 0.342$.
Unfortunately, $\mathcal{R}$ is not convex $\cup$ in $\mathcal{P}$, so we are not certain
that for arbitrary $b$ maximum redundancy occurs for
some uniform distribution.
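The redundancy values quoted above can be reproduced numerically from Eq. (16). The sketch below is illustrative only: it assumes the same codeword mapping as the $b = 4$ example matrix (sign bit first, magnitude $|i|$ for $i \le 0$ and $i - 1$ for $i > 0$), and the names `codeword`, `binary_entropy`, `entropy`, and `redundancy` are hypothetical.

```python
import math

def codeword(i, b):
    """Assumed mapping: sign bit, then magnitude |i| (i <= 0) or i - 1 (i > 0)."""
    sign = 1 if i > 0 else 0
    magnitude = i - 1 if i > 0 else -i
    return f"{sign}{magnitude:0{b - 1}b}"

def binary_entropy(x):
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def redundancy(dist, b):
    """Sum of per-bit entropies minus the joint entropy, for dist = {index: probability}."""
    pi = [sum(p for i, p in dist.items() if codeword(i, b)[k] == "0")
          for k in range(b)]                      # pi[k] = P(bit k is zero)
    return sum(binary_entropy(x) for x in pi) - entropy(dist.values())

r_b2 = redundancy({-1: 1 / 3, 0: 1 / 3, 1: 1 / 3, 2: 0.0}, b=2)
r_p3 = redundancy({i: 1 / 5 for i in range(-2, 3)}, b=3)
r_bad = redundancy({0: 0.5, 4: 0.5}, b=3)         # pathological case, violates Eqs. (2)-(4)
print(r_b2, r_p3, r_bad)
```

The third call shows the unconstrained worst case for $b = 3$: the two equally likely codewords 000 and 111 make every bit look random on its own, although the joint entropy is only one bit.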

We conjecture that for any $b$, the maximum redundancy
subject to Eqs. (2)-(4) occurs for some uniform $f(x)$. If
this conjecture is correct, then to find a bound on redundancy
we examine redundancy in the limit $b \to \infty$. Consider
what happens when the quantizer range is $[-1/2, 1/2]$
and $f(x)$ is uniform over $[-W/2, W/2]$, for some $0 < W \le 1$,
as in Fig. 9. The lines and spaces in the figure replace
the codeword assignment matrix: the lines denote a one in
the corresponding bit position and the gaps denote a zero
(compare to Fig. 6). Note that in the limit the codeword
assignment is symmetric about $x = 0$.

For a large fixed $b$, the redundancy for a uniform
distribution is

$$\mathcal{R}_u^b(W) = \sum_{i=1}^{b} \mathcal{H}(\pi_i) - H(\mathcal{P}) = \sum_{i=1}^{b} \mathcal{H}(\pi_i) - b - \log_2 W$$

using the fact that $H(\mathcal{P}) = \log_2(W 2^b) = b + \log_2 W$,
which is a consequence of the uniform distribution. Given
$i$, $1 - \pi_i$ is equal to the sum of the lengths of the darkened
line segment portions divided by $W$. We ignore
the sign bit because $\pi_{\text{sign}} = 1/2$, so this bit makes no
contribution to the redundancy in the limit. Examining
Fig. 9, we can see that the interval $[-W/2, W/2]$
will always contain either an integer number of line segments
or an integer number of gaps. Thus, either $\pi_i$ or
$1 - \pi_i$ will be equal to $\lfloor 1/2 + W2^{i-1} \rfloor / (W2^i)$, so
$\mathcal{H}(\pi_i) = \mathcal{H}\left(\lfloor 1/2 + W2^{i-1} \rfloor / (W2^i)\right)$
and in the limit the redundancy is

$$\mathcal{R}_u^\infty(W) = -\log_2 W - \sum_{i=1}^{\infty} f_i(W) \qquad (17)$$

where

$$f_i(W) = 1 - \mathcal{H}\left(\frac{\lfloor 1/2 + W2^{i-1} \rfloor}{W2^i}\right)$$

Note that we can limit our analysis to the case where
$W > 1/2$ without loss of generality, because
$\mathcal{R}_u^\infty(W/2^n) = \mathcal{R}_u^\infty(W)$ for any integer $n$.
Figure 10 shows $\mathcal{R}_u^\infty(W)$, and
several of the $f_i$ are shown in Fig. 11. If $W = j/2^n$ for integer
$j$ and $n$, then only the first $n$ terms in the summation
of Eq. (17) are nonzero.

The function $\mathcal{R}_u^\infty(W)$ attains a maximum of approximately
0.34544 near $W^* \approx 0.610711$. It is difficult to
determine an analytic expression for $W^*$ or $\mathcal{R}_u^\infty(W^*)$,
in part because the first derivative of $\mathcal{R}_u^\infty(W)$ is discontinuous
at infinitely many points in $[1/2, 1]$. Given any
$\mathcal{R}^* < \mathcal{R}_u^\infty(W^*)$, we can find a uniform $f(x)$ and finite $b$
producing redundancy $\mathcal{R}^*$ or higher. We conjecture that
in general

$$\mathcal{R} \le \mathcal{R}_u^\infty(W^*) \approx 0.34544$$

is a tight upper bound for any pdf producing $\mathcal{P}$ satisfying
Eqs. (2)-(4). It is interesting that this bound is independent
of $b$, while without any restrictions on $\mathcal{P}$ the
redundancy can be as large as $b - 1$.

C. Performance

Including all of the overhead effects, using Eqs. (11),
(13), and (14), the rate of the bit-wise arithmetic coder is
approximately

$$R_{\text{bit-arith}} \approx H(\beta_1\beta_2\cdots\beta_b) + \mathcal{R} + b\left[\Delta^*\left(2^{m^*(N)}\right) + \frac{m^*(N) + 0.82}{N}\right]$$

$$\approx H(\beta_1\beta_2\cdots\beta_b) + \mathcal{R} + \frac{b}{N}\left[\frac{1}{2\ln 2} - 0.18 + \tfrac{1}{2}\log_2 N + \log_2 \pi\right]$$

$$\approx H(\beta_1\beta_2\cdots\beta_b) + \mathcal{R} + \frac{b}{N}\left[2.19 + \tfrac{1}{2}\log_2 N\right] \qquad (18)$$

In Fig. 12, we plot theoretical and simulated rate-distortion
curves for the bit-wise arithmetic coder applied
to Gaussian and Laplacian sources. Bit-wise arithmetic
coding performs particularly well on the Laplacian source.
Note that the increase in rate when we decrease $N$ from
512 to 256 is rather small. This is not surprising considering
Eq. (18). The use of smaller $N$ implies faster
adaptability to a nonstationary source, and also gives advantages
when using a small buffer.

D. Possible Refinements 

There are a few tricks that we might use to further 
reduce the rate in a practical system: 
(1) The arithmetic encoder could adaptively estimate
the probability rather than simply count the number
of zeros in a sequence. This might improve adaptivity
to nonstationary sources, in addition to reducing
overhead.

(2) The relative frequency of zeros in a sequence, $F$,
must be a multiple of $1/N$, so not all “corners” (i.e.,
the $t_i$ in Fig. 4) can be reached. Keeping this fact
in mind, we could adjust the $p_i$ values to obtain
a slight improvement in performance: lower $\Delta^*(M)$
and perhaps lower $m^*(N)$. The rate reduction would
probably be minuscule except at very small $N$.

(3) Additional savings may be obtained by considering 
the variations in rate due to the finite precision of 
the encoder. When F is near a corner, we could 
(time and complexity permitting) encode the se- 
quence using the two nearest pi to see which pro- 
duces a shorter sequence. 

(4) We could make the pi more dense in the regions that 
are more probable. For example, if Eqs. (2)-(4) are 
satisfied, then the probability of a zero will always 
be higher than the probability of a one, so we could 
require most of the pi to be less than 1/2. 

(5) Since the decoder knows $N$, $t_{I-1}$, and $t_I$ before it
decodes, it knows that the sequence must contain
between $\lceil N t_{I-1} \rceil$ and $\lfloor N t_I \rfloor$ zeros. In the current
implementation, the decoder does not explicitly exploit
this information. We could update $P$ as we
encode/decode, taking into account the number of
zeros that must remain in the sequence, “like counting
cards in Vegas,” comments Sam Dolinar.


(6) We might also get a slight improvement by combin- 
ing the overhead of blocks, thus not requiring that 
M, the number of pi, be a power of two. 

(7) We could require that $p_1 = 0$, so that if a block
consisted of the all-zeros sequence, we could encode
the entire sequence simply by using the $m$ overhead
bits to identify $P = p_1$.

(8) It might be convenient to keep track of the encoder
output sequence length during the encoder operation,
and send the data unencoded if the length exceeds
$N$. This corresponds to forcing one of the $p_i$ to
be equal to 1/2. Once this happens, we might assume
that all remaining bit positions are sufficiently
random so that we are better off sending them unencoded
and saving the overhead. This reflects the conventional
wisdom that for many real-world sources,
quantized samples are often compressible only in the
most-significant bits.

V. Conclusion 

The bit-wise arithmetic encoding technique provides a 
simple method for data compression. The independent 
treatment of the codeword bits provides its main assets: 
The technique is simple, it can be used in progressive 
transmission or as a means of alleviating buffer overflow 
problems, and it has low overhead that increases linearly 
in the number of quantizer bits rather than exponentially. 
For the Gaussian and Laplacian sources, the rate is quite 
close to the entropy. The independent treatment of bits 
can also be its greatest liability — for sources where the 
codeword bits are highly correlated, the redundancy can 
be substantial. As with any data compression method, 
the usefulness of this technique ultimately depends on the 
source to be compressed. 
Table 1. The optimal relationship between $M$ and $\Delta^*$.

M        $\Delta^*(M)$
2^0      1.0
2^1      0.32193
2^2      0.093506
2^3      0.025407
2^4      0.0066389
2^5      0.0016980
2^6      0.00045745
2^7      0.00010800
2^8      0.000027077

Table 2. The optimal relationship between block length $N$ and overhead bits $m$.

N                 $m^*(N)$
1                 0
[2, 4]            1
[5, 14]           2
[15, 53]          3
[54, 202]         4
[203, 806]        5
[807, 2861]       6
[2862, 12358]     7
Fig. 1. Example of a pdf and reconstruction points for a four-bit uniform quantizer.

Fig. 2. Rate distortion for the uniform quantizer over a wide range of $\delta$ (axis label: rate, bits/sample): (a) Gaussian source, b = 4; (b) Laplacian source, b = 4; (c) Gaussian source, b = 4 and b = 8; and (d) Laplacian source, b = 4 and b = 8.

Fig. 3. Binary entropy, predicted and actual performance of the 9-bit binary arithmetic encoder as a function of F for all possible sequences of length N = 13 when P = 1/3.

Fig. 4. Binary entropy and line-segment approximation.

Fig. 5. $\Delta^*(M)$ and approximation $(\pi^2 / 8 \ln 2) M^{-2}$.

Fig. 6. Codeword assignment for the four-bit quantizer.

Fig. 7. Bit-wise arithmetic coding, Huffman coding, and entropy for Gaussian source, b = 4, without overhead.

Fig. 8. Bit-wise arithmetic coding, Huffman coding, and entropy for Laplacian source, b = 4, without overhead.

Fig. 9. Uniform distribution and codeword assignment in the limit $b \to \infty$ (axis marks: -1/2, -W/2, 0, W/2, 1/2).

Fig. 12. Bit-wise arithmetic coding performance, including overhead: (a) Gaussian source, b = 4, N = 512; (b) Gaussian source, b = 4, N = 256; (c) Laplacian source, b = 4, N = 512; and (d) Laplacian source, b = 4, N = 256.